DATA 1001 Project 2

Authors

Thieu Quang Khai | SID 540932335

University Of Sydney | DATA1001 | October 2024

1. Client Bio and Recommendation

Client:

Global Facility for Disaster Reduction and Recovery (GFDRR)

Bio:

Established in 2006, GFDRR is a multi-donor partnership that aids low and middle-income countries in managing and reducing risks from natural hazards and climate change. It provides financial support for technical assistance, invests in innovative solutions, and shares global knowledge to enhance disaster risk management and climate adaptation policies. Since 2015, GFDRR has mobilized approximately $35 billion for disaster resilience operations, focusing on vulnerable countries where support can yield significant impacts.

Recommendation:

GFDRR should prioritize expanding disaster resilience efforts in regions experiencing increasing impacts from natural disasters, especially in Asia, which has the highest number of affected people. Focused investments in early warning systems, community preparedness, and infrastructure reinforcement are essential to mitigate both human and economic losses.

2. Evidence

2.1 Initial Data Analysis (IDA)

  • For the purpose of this report, we will focus on the column Number.of.total.people.affected.by.’disaster type’ and Total.economic.damages.from.’disaster types’.
  • The data on economic losses and the number of people affected was scaled using a logarithmic transformation to manage high variance while keeping the linearity.
  • A random value between 0.001 and 10 was added to the data on the number of people affected by disasters to improve map visualization (due to the original data having many missing and zero values). This adjustment will not impact our final conclusions, as we are analyzing data on a scale of millions of people.
Code
# Loading necessary library
library(leaflet)
library(dplyr)
library(sf)
library(rnaturalearth)
library(tidyverse)
library(htmltools)

# Load data
data = read.csv("natural-disasters.csv")

# Add random values. Set seed to ensure reproduceability
set.seed(1)

data$Number.of.total.people.affected.by.earthquakes =
data$Number.of.total.people.affected.by.earthquakes + runif(nrow(data), min = 0.001, max = 10)

data$Number.of.total.people.affected.by.floods = data$Number.of.total.people.affected.by.floods + runif(nrow(data), min = 0.001, max = 10)

data$Number.of.total.people.affected.by.storms = data$Number.of.total.people.affected.by.storms + runif(nrow(data), min = 0.001, max = 10)

data$Number.of.total.people.affected.by.drought = data$Number.of.total.people.affected.by.drought + runif(nrow(data), min = 0.001, max = 10)

data$Number.of.total.people.affected.by.disasters = data$Number.of.total.people.affected.by.disasters + runif(nrow(data), min = 0.001, max = 10)

# Change name of some countries to match the name in the map.
data <- data %>%
  mutate(Entity = case_when(
    Entity == "United States" ~ "United States of America",
    Entity == "South Sudan" ~ "S. Sudan",
    Entity == "Democratic Republic of Congo" ~ "Dem. Rep. Congo",
    Entity == "Central African Republic" ~ "Central African Rep.",
    Entity == "Eswatini" ~ "eSwatini",
    Entity == "Bosnia and Herzegovina" ~ "Bosnia and Herz.",
    Entity == "Dominican Republic" ~ "Dominican Rep.",
    Entity == "Cote d'Ivoire" ~ "Côte d'Ivoire",
    TRUE ~ Entity  # Retain original name for others
  ))

# view(data)

2.2 Trend of some common disasters over time

The line graph shows an overall upward trend in the number of people affected by disasters over time, signaling an ongoing need for improved disaster risk management. Although there is a recent dip, historical data suggests that more frequent and severe disasters may occur in the future (Wallemacq et al., 2015).

Code
# Summarize the total people affected by each disaster type by year

flood_affected_by_year <- data %>%
  group_by(Year) %>%
  summarize(total_affected = sum(Number.of.total.people.affected.by.floods, na.rm = TRUE))

storm_affected_by_year <- data %>%
  group_by(Year) %>%
  summarize(total_affected = sum(Number.of.total.people.affected.by.storms, na.rm = TRUE))

earthquake_affected_by_year <- data %>%
  group_by(Year) %>%
  summarize(total_affected = sum(Number.of.total.people.affected.by.earthquakes, na.rm = TRUE))

drought_affected_by_year <- data %>%
  group_by(Year) %>%
  summarize(total_affected = sum(Number.of.total.people.affected.by.drought, na.rm = TRUE))

disaster_affected_by_year <- data %>%
  group_by(Year) %>%
  summarize(total_affected = sum(Number.of.total.people.affected.by.disasters, na.rm = TRUE))

# Combine all data into one dataframe
combined_data <- 
  flood_affected_by_year %>% rename(Floods = total_affected) %>%
  left_join(storm_affected_by_year %>% rename(Storms = total_affected), by = "Year") %>%
  left_join(earthquake_affected_by_year %>% rename(Earthquakes = total_affected), by = "Year") %>%
  left_join(drought_affected_by_year %>% rename(Droughts = total_affected), by = "Year") %>%
  left_join(disaster_affected_by_year %>% rename(Disasters = total_affected), by = "Year")

# Create a combined line graph
ggplot(combined_data, aes(x = Year)) +
  geom_line(aes(y = Floods, color = "Floods"), size = 0.8) +
  geom_point(aes(y = Floods, color = "Floods"), size = 2) +
  geom_line(aes(y = Storms, color = "Storms"), size = 0.8) +
  geom_point(aes(y = Storms, color = "Storms"), size = 2) +
  geom_line(aes(y = Earthquakes, color = "Earthquakes"), size = 0.8) +
  geom_point(aes(y = Earthquakes, color = "Earthquakes"), size = 2) +
  geom_line(aes(y = Droughts, color = "Droughts"), size = 0.8) +
  geom_point(aes(y = Droughts, color = "Droughts"), size = 2) +
  geom_line(aes(y = Disasters, color = "Disasters (General)"), size = 0.8) +
  geom_point(aes(y = Disasters, color = "Disasters (General)"), size = 2) +
  labs(title = "Total People Affected by Different Disasters Over Time",
       x = "Year",
       y = "Total Number of People Affected") +
  scale_color_manual(values = c("Floods" = "#007BFF", 
                                  "Storms" = "#FF5733", 
                                  "Earthquakes" = "#28A745", 
                                  "Droughts" = "#FFC107", 
                                  "Disasters (General)" = "#6F42C1")) +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        panel.grid.major = element_line(color = "lightgrey"),
        panel.grid.minor = element_blank()) +
  scale_y_continuous(labels = scales::comma) +
  labs(color = "Disaster Type")

2.3 Linear Regression

To provide evidence for the recommendation, a linear regression model was applied to investigate the correlation between the number of people affected by disasters and the total economic damages caused by these disasters. The analysis revealed a moderate positive correlation (r = 0.512), with a p-value smaller than 2.2e-16, indicating that the relationship between the number of people affected by disasters and the economic damage is statistically significant.

This correlation suggests that as more people are affected by disasters, economic damages increase proportionately. Thus, prioritizing investments in disaster preparedness and mitigation in regions with high numbers of people affected could reduce both human casualties and the financial burdens associated with recovery efforts (Botzen et al., 2019).

Code
# Select relevant columns
df_analysis <- data %>%
  select(
    Number.of.total.people.affected.by.disasters,
    Total.economic.damages.from.disasters
  ) %>%
  drop_na()  # Remove rows with NA values

# Log-transform the data to reduce variance
df_analysis_log <- df_analysis %>%
  mutate(
    log_people_affected = log(Number.of.total.people.affected.by.disasters),
    log_economic_damages = log(Total.economic.damages.from.disasters)
  )

# Filter data to not include random values that we added
data_filtered_log = subset(df_analysis_log, 
log_people_affected > log(10) & log_economic_damages > log(10))


# Create a scatter plot with log-transformed values
ggplot(data_filtered_log, aes(x = log_people_affected, y = log_economic_damages)) +
  geom_point(color = "darkseagreen") +  # Points on the scatter plot
  geom_smooth(method = "lm", se = TRUE, color = "black") +  # Linear regression line
  labs(title = "Log-Scaled Correlation between People Affected by Disasters and Economic Impact",
       x = "Total Number of People Affected",
       y = "Total Economic Damages") +
  theme_minimal() 

Code
# Find correlation
cor(data_filtered_log$log_economic_damages, data_filtered_log$log_people_affected)
[1] 0.5117047

2.4 Map Of Some Common Disasters

The map reveals that Asia is disproportionately affected by disasters, particularly by floods, droughts, earthquakes, and storms. This region should be prioritized due to its vulnerability and the potential high impact of mitigation efforts in protecting lives and reducing damage.

Code
# Define function to calculate number of people affected based on disaster type
calculate_people_affected <- function(data, disaster_type) {
  affected_column <- paste0("Number.of.total.people.affected.by.", disaster_type)
  
  summary_data <- data %>%
    group_by(Entity) %>%
    summarise(affected_people = mean(get(affected_column), na.rm = TRUE))
  
  #view(summary_data)
  return(summary_data)
}


# Define function to create map
create_map <- function(disaster_type){
  
  # Calculate affected people for the selected disaster type
  affected_data <- calculate_people_affected(data, disaster_type)
  
  #view(affected_data)
  
  # Round up floor value
  affected_data <- affected_data %>%
  mutate(across(where(is.numeric), ceiling))
  
  # Load world shapefile
  world <- st_as_sf(ne_countries(scale = "medium", returnclass = "sf"))
  
  # Join death data with world map
  map_data <- left_join(world, affected_data, by = c("name" = "Entity"))

  # Make color palette
  medium_green_to_blue_palette <- colorRampPalette(c("#A8E6CF", "#5BC0BE", "#4EA8DE", "#2196F3"))(8)
  
  # Define breaks for 8 intervals based on the affected_people values
  breaks <- unique(quantile(map_data$affected_people, 
                           probs = seq(0, 1, length.out = 9), 
                           na.rm = TRUE))
  
  # Create the interactive map
  m <- leaflet(map_data) %>%
    addTiles("https://{s}.basemaps.cartocdn.com/rastertiles/voyager/{z}/{x}/{y}{r}.png",
             options = tileOptions(minZoom = 2, maxZoom = 256)) %>%
    
    setView(lng = 0, lat = 20, zoom = 2) %>%
    
    # Set maximum bounds for the map (this prevents dragging outside the world)
    setMaxBounds(lng1 = -180, lat1 = -90, lng2 = 180, lat2 = 90) %>%
    
    addPolygons(
      fillColor = ~colorBin(palette = medium_green_to_blue_palette, domain = map_data$affected_people, bins = breaks)(affected_people),
                weight = 1,
                opacity = 1,
                color = "white",
                dashArray = "1",
                fillOpacity = 0.7,
                highlightOptions = highlightOptions(color = "white", weight = 2, bringToFront = TRUE),
                label = ~paste(name, ": ", affected_people)) %>%
    
    addLegend(position = "bottomright", 
              pal = colorQuantile(medium_green_to_blue_palette, NULL, n = 8),
              values = ~affected_people,
              title = paste("Total people affected by", disaster_type),
              opacity = 1,
              na.label = "No Data",
              labFormat = labelFormat(prefix = " ", suffix = " ", between = "  ")
    ) %>%
    
    # Fix the box for NA data that incorrectly displayed
    htmlwidgets::onRender("function(el) { 
      el.style.backgroundColor = '#FFFFFF'; 
      el.style.color = '#000000'; 
    }")
    
  css_fix <- "div.info.legend.leaflet-control br {clear: both;}" 
  html_fix <- htmltools::tags$style(type = "text/css", css_fix)
  m <- htmlwidgets::prependContent(m, html_fix)
  
  return(m)
}
Code
create_map("floods")
Code
create_map("drought")
Code
create_map("storms")
Code
create_map("earthquakes")
Code
create_map("disasters")

2.5 Hypothesis Testing

To test the claim: “There is a correlation between economic impacts and the number of people affected by natural disasters,” a hypothesis-testing framework was applied. This analysis resulted in a p-value < 0.05, indicating a statistically significant relationship between the number of people affected and economic damages. This suggests that GFDRR should focus more on regions with high populations and also vulnerable to disasters like Asia.

3. Appendix: Defense of Approach

3.1 Client Choice

As a Vietnamese student, I selected the GFDRR due to their vital support for Asian countries, especially Vietnam, impacted by disasters like the recent super typhoon Yagi. This choice influenced my report to focus on the economic effects of such disasters in the region.

3.2 Statistical Analysis

Code
# Make 
model = lm(log_people_affected ~ log_economic_damages, data = data_filtered_log)
ggplot(model, aes(x = .fitted, 
                  y = .resid)) +
  geom_point(color = "darkseagreen") +
  geom_hline(yintercept = 0, 
             linetype = "dotted", 
             colour = "red") +
  labs(title = "Residual Plot for People Affected by Disasters and Economic Impact",
       x = "Fitted Values",
       y = "Residuals")

Value points are randomly scattered around the horizontal line, demonstrating homoscedasticity.

  1. Hypothesis
  • Null Hypothesis (H₀): There is no correlation between the number of people affected by disasters and the economic impact.
  • Alternative Hypothesis (H₁): There is a correlation between the number of people affected by disasters and the economic impact.
  1. Assumptions
  • Independence: We assume the samples are independence.
  • Homoscedasticity: Residuals plot seems random.
  • Normality: QQ plot suggest normal distribution, although some deviations are evident at the extremes
Code
# Make QQ plot
ggplot(data_filtered_log, aes(sample = resid(model))) +
  stat_qq() +
  stat_qq_line(col = "red") +
  labs(title = "Q-Q Plot of Residuals (Economic Impact vs People Affected)", x = "Theoretical Quantiles", y = "Sample Quantiles") + theme_minimal()

Code
summary(model)

Call:
lm(formula = log_people_affected ~ log_economic_damages, data = data_filtered_log)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.2433 -1.9433  0.2005  2.1132  6.3337 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)           4.93607    0.36560   13.50   <2e-16 ***
log_economic_damages  0.55005    0.03423   16.07   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.895 on 728 degrees of freedom
Multiple R-squared:  0.2618,    Adjusted R-squared:  0.2608 
F-statistic: 258.2 on 1 and 728 DF,  p-value: < 2.2e-16
  1. Test-statistic: t = 16.07

  2. P-Value: p-value: < 2.2e-16

  3. Conclusion: Since p-value < 0.05, we reject H0 and accept H1. This means there is a moderate positive relation between the number of people affected by disasters and the economic impacts.

3.3 Limitations

  • There was a lot of missing data.
  • Some impossible data like floating number for people affected.
  • The analysis only shows a relationship, not a trend since we can’t control natural disasters.

4. Ethics Statement

4.1 Shared Value: Respect for Individuals and Communities

I upheld the value of respect for individuals and communities by prioritizing privacy in my data handling and focusing on the effects of natural disasters on vulnerable populations. By emphasizing the human aspect of the data, I aimed to highlight the importance of informed decision-making in disaster response and recovery, ensuring that the voices of affected communities were considered in my analysis.

4.2 Ethical Principle: Pursuing Objectivity

I ensured objectivity in my analysis by using robust statistical methods and relevant data to provide accurate results. I presented all findings transparently, including limitations, to prevent misinterpretation. When results contradicted expectations, I communicated them clearly to emphasize their significance.

5. Acknowledgements

  • Wallemacq, P., Guha-Sapir, D., McClean, D., & Unisdr. (2015). The Human Cost of Natural Disasters - A global perspective. ResearchGate. https://www.researchgate.net/publication/317645955_The_Human_Cost_of_Natural_Disasters_-_A_global_perspective

  • Botzen, W. J. W., Deschenes, O., & Sanders, M. (2019). The Economic Impacts of Natural Disasters: A Review of Models and Empirical Studies. Review of Environmental Economics and Policy, 13(2), 167–188. https://ideas.repec.org/a/oup/renvpo/v13y2019i2p167-188..html

  • Learn to add color to map: https://github.com/Leaflet/Leaflet/discussions/9000

  • Find map style: https://github.com/CartoDB/basemap-styles

  • Fix NA color square not at correct place: https://github.com/rstudio/leaflet/issues/615

  • Learn how to do tabset: https://freyasystems.com/how-to-create-tabsets-using-quarto-r/

  • Learn to filter, rename data: https://edstem.org/au/courses/16787/discussion/2285233

  • Learn about headers: https://zsmith27.github.io/rmarkdown_crash-course/lesson-4-yaml-headers.html